
A high performance prediction of HPV genotypes by Chaos game representation and singular value decomposition

BACKGROUND: Human Papillomavirus (HPV) genotyping is an important approach to fight cervical cancer due to the relevant information regarding risk stratification for diagnosis and the better understanding of the relationship of HPV with carcinogenesis. This paper proposed two new feature extraction...


Bibliographic Details
Main Authors: Tanchotsrinon, Watcharaporn, Lursinsap, Chidchanok, Poovorawan, Yong
Format: Online Article Text
Language: English
Published: BioMed Central 2015
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4375884/
https://www.ncbi.nlm.nih.gov/pubmed/25880169
http://dx.doi.org/10.1186/s12859-015-0493-4
author Tanchotsrinon, Watcharaporn
Lursinsap, Chidchanok
Poovorawan, Yong
author_facet Tanchotsrinon, Watcharaporn
Lursinsap, Chidchanok
Poovorawan, Yong
author_sort Tanchotsrinon, Watcharaporn
collection PubMed
description BACKGROUND: Human Papillomavirus (HPV) genotyping is an important tool in the fight against cervical cancer: it provides information for risk stratification in diagnosis and improves understanding of the relationship between HPV and carcinogenesis. This paper proposes two new feature extraction techniques, ChaosCentroid and ChaosFrequency, for predicting HPV genotypes associated with the cancer. An additional, diverse set of 12 HPV genotypes, i.e. types 6, 11, 16, 18, 31, 33, 35, 45, 52, 53, 58, and 66, was studied in this paper. In the proposed techniques, a partitioned Chaos Game Representation (CGR) is used to represent HPV genomes. ChaosCentroid captures the structure of a sequence through the centroid of each sub-region, using the Euclidean distances among the centroids and the center of the CGR to relate the sub-regions to one another. ChaosFrequency extracts the statistical distribution of mono-, di-, or higher-order nucleotides along HPV genomes and forms a matrix of the frequency of dots in each sub-region. For performance evaluation, four types of classifiers were used, i.e. Multi-layer Perceptron, Radial Basis Function, K-Nearest Neighbor, and Fuzzy K-Nearest Neighbor, and the best results from each classifier were compared with the NCBI genotyping tool. RESULTS: The experimental results from the four classifiers follow the same trend. ChaosCentroid performed considerably better than ChaosFrequency when the input length was one, but moderately worse when the input length was two. Both proposed techniques yielded the best or nearly the best performance when the input length was more than three. However, there was no significant difference between the proposed techniques and the comparative alignment method. CONCLUSIONS: The proposed alignment-free and scale-independent method can successfully transform HPV genomes of 7,000 - 10,000 base pairs into features of 1 - 11 dimensions. This indicates that ChaosCentroid and ChaosFrequency can serve as effective feature extraction techniques for predicting HPV genotypes. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s12859-015-0493-4) contains supplementary material, which is available to authorized users.
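The abstract describes a partitioned CGR, a per-sub-region dot count, and per-sub-region centroids. The sketch below illustrates those generic ideas only; it is not the authors' implementation. The corner assignment for A, C, G, T, the `2^k`-by-`2^k` grid, and the function names `cgr_points`, `frequency_matrix`, and `centroid_distances` are all assumptions made for illustration, and the paper's exact feature definitions (including how singular value decomposition is applied) are not given in this record.

```python
from math import hypot

# Assumed corner convention (a common one for CGR; not taken from the paper).
CORNERS = {"A": (0.0, 0.0), "C": (0.0, 1.0), "G": (1.0, 1.0), "T": (1.0, 0.0)}


def cgr_points(seq):
    """Return one CGR dot per nucleotide; symbols other than A/C/G/T are skipped."""
    x, y = 0.5, 0.5  # start at the center of the unit square
    points = []
    for base in seq.upper():
        corner = CORNERS.get(base)
        if corner is None:
            continue  # e.g. ambiguous base N; ignored in this sketch
        # Move halfway from the current dot toward the nucleotide's corner.
        x, y = (x + corner[0]) / 2.0, (y + corner[1]) / 2.0
        points.append((x, y))
    return points


def _partition(points, k):
    """Group dots into a 2^k-by-2^k grid of sub-regions keyed by (row, col)."""
    n = 2 ** k
    cells = {}
    for x, y in points:
        col = min(int(x * n), n - 1)
        row = min(int(y * n), n - 1)
        cells.setdefault((row, col), []).append((x, y))
    return n, cells


def frequency_matrix(points, k):
    """Count of dots in each sub-region, as a list of rows."""
    n, cells = _partition(points, k)
    return [[len(cells.get((r, c), [])) for c in range(n)] for r in range(n)]


def centroid_distances(points, k):
    """Map each non-empty sub-region to (centroid, distance to the CGR center)."""
    _, cells = _partition(points, k)
    result = {}
    for key, pts in cells.items():
        cx = sum(p[0] for p in pts) / len(pts)
        cy = sum(p[1] for p in pts) / len(pts)
        result[key] = ((cx, cy), hypot(cx - 0.5, cy - 0.5))
    return result


if __name__ == "__main__":
    dots = cgr_points("AACG")
    print(frequency_matrix(dots, 1))
    print(centroid_distances(dots, 1))
```

With `k = 1` each dot falls in the quadrant of the corner of its own nucleotide, so the frequency matrix reduces to mononucleotide counts; larger `k` resolves longer suffixes, which matches the abstract's mention of mono-, di-, or higher-order nucleotide statistics.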
format Online
Article
Text
id pubmed-4375884
institution National Center for Biotechnology Information
language English
publishDate 2015
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-4375884 2015-03-28 A high performance prediction of HPV genotypes by Chaos game representation and singular value decomposition Tanchotsrinon, Watcharaporn Lursinsap, Chidchanok Poovorawan, Yong BMC Bioinformatics Methodology Article BioMed Central 2015-03-05 /pmc/articles/PMC4375884/ /pubmed/25880169 http://dx.doi.org/10.1186/s12859-015-0493-4 Text en © Tanchotsrinon et al.; licensee BioMed Central. 2015 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
spellingShingle Methodology Article
Tanchotsrinon, Watcharaporn
Lursinsap, Chidchanok
Poovorawan, Yong
A high performance prediction of HPV genotypes by Chaos game representation and singular value decomposition
title A high performance prediction of HPV genotypes by Chaos game representation and singular value decomposition
title_sort high performance prediction of hpv genotypes by chaos game representation and singular value decomposition
topic Methodology Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4375884/
https://www.ncbi.nlm.nih.gov/pubmed/25880169
http://dx.doi.org/10.1186/s12859-015-0493-4